Abstract

The objective quantification of similarity between two mathematical structures constitutes a
recurrent issue in science and technology. In the present work, we developed a principled
approach that took the Kronecker’s delta function of two scalar values as the prototypical
reference for similarity quantification and then derived four more yielding indices, three of
which are bounded between 0 and 1. Generalizations of these indices to take into account the
sign of the scalar values were then presented and developed to multisets, vectors, and
functions in real spaces. Several important results have been obtained, including the
interpretation of the Jaccard index as a yielding implementation of the Kronecker’s delta
function. When generalized to real functions, the four described similarity indices become
respective functionals, which can then be employed to obtain associated operations of
convolution and correlation.


‘Springtime, always plentiful of most diverse similarities.’ 
1 Introduction


It is often mentioned that one of the most important as- 
pects of science is the quantification of the physical world 
structures and phenomena, through respective measure- 
ments, so as to allow the development of objective theo- 
ries. While this is certainly true, there is a complementary 
aspect to taking measurements, and this concerns compar- 
ing and ordering the obtained quantifications so as to be 
able to take decisions on the most plausible models and 
explanations. 

For instance, given a model, assessing its ability to account
for the respectively modeled systems consists of comparing
not only scalar values, but also vectors, matrices, functions,
as well as potentially any other mathematical structure.
Indeed, the very validation of models and theories relies
critically on several logical and quantitative indications
of similarity.

In addition to the critically important role of comparisons
in science, living beings also continuously rely on
comparing entities, be they received stimuli or more
complex mental representations. For instance, humans
are always comparing the weather today with that of
other times.

Comparing things in an objective quantitative manner
involves the adoption of one or more measurements of
similarity or distance between pairs of values, of which the
Euclidean distance seems to have a particular importance.

On subsequent scales, measurements are also required 
for comparing sets of objects, and here the cosine similar- 
ity and Jaccard indices (e.g. [1, 2]) are often employed. At 
an even higher level, we need approaches capable of quan- 
tifying the similarity between functions, in which case 
the inner product, and the respectively derived opera- 
tions of convolution and correlation, are often adopted 
(e.g. [3, 4, 5, 6]). 

Given that similarity and distance are intrinsically in- 
terrelated, including the fact that one can often be de- 
rived from the other, the present work will focus only on 
similarity measurements, with the obtained results being 
immediately extensible to distances. 

Interestingly, the Euclidean distance, cosine similarity
and inner product all share the same aspect, which consists
in being based on products between pairs of values. As
such, these approaches can be said to have a second order
nature (x · x = x^2). However, there is a virtually infinite
number of other possible distance and similarity measurements,
including those based on minimum, maximum and
absolute values.

In the present work, we aim at developing a principled
approach in which we start by contemplating the similarity
between two scalar real values x and y, from which
it is concluded that the Kronecker’s delta function provides
a prototypical reference. However, given that this
approach is too strict, it becomes necessary to relax the
Kronecker’s delta function criterion so as to obtain more
yielding respective similarity indices. Four main possibilities
are identified, denominated s1, s2, s3, and s4, three of
which are suitably bounded in the interval [0,1].

Then, by using concepts derived from [1, 2], we describe 
how these four indices can be generalized in order to pro- 
vide additional information about the relative alignment 
between the two compared scalar values, yielding 4 re- 
spective versions of the adopted indices. 

These indices are then further generalized, again by 
considering the results in [1, 2], to multisets (e.g. [7, 8, 
9, 10, 11, 12]), vectors, and real functions. Though re- 
spective extensions to other mathematical structures in- 
cluding matrices, graphs, and scalar and vector fields are 
analogous, we do not develop these possibilities in the 
present work. 

Two other similarity indices, namely the interiority 
(or homogeneity) and coincidence indices [1, 2, 13], are 
also presented in their generalized versions for functions. 
In particular, the coincidence index has been found to 
present enhanced performance in important tasks such 
as pattern recognition [13] as well as when extended to 
act as quantifiers of joint variation between random vari- 
ables [1, 14]. 

When generalized to real function spaces, the four in- 
dices become functionals and, as such, can be combined 
in several manners and also used to implement respective 
convolution and correlation binary operations between 
functions. 

Among the several interesting results obtained, we have
that the indices proposed in [1], especially the coincidence
and addition-based mset Jaccard indices, actually correspond
to generalizations of the scalar indices s1 and s2.

We start by deriving the four similarity indices from
the Kronecker’s delta function, and proceed by presenting
their extension to negative values and further generalization
to multisets, vectors, and functions. Generalizations
of the interiority and coincidence indices to multisets, vectors
and functions possibly taking negative values are then
presented, followed by the employment of all considered
indices to define respective convolutions and correlations.


2 Pairwise Similarities in R 


Before proceeding in depth with any study of distances,
it is important to state as objectively as possible
what is meant by similarity. Unlike the concept of
distance, which is ubiquitously associated with the concept
of Euclidean distance, there seems to be less consensus
regarding what similarity means.

In this work, we will understand similarity between two 
values x and y in the sense of identity between them. Fig- 
ure 1 illustrates the most strict approach to quantifying 
the similarity between any two real values x and y, which 
assigns 1 whenever x = y, and 0 otherwise. 











Figure 1: The most strict quantification of the similarity between
two real values x and y, implemented via a similarity binary operator
δ_{x,y} that corresponds to the Kronecker delta. A non-zero result is
obtained only in case x = y.


Mathematically, this strict similarity quantification corresponds
to a continuous Dirac delta comb function δ_{x,y}.

The problem with this approach evidently is that it is 
way too strict, so that it becomes necessary to provide 
means for implementing some tolerance in the quantifica- 
tion. 
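To make this strictness concrete, the following minimal Python sketch implements the Kronecker delta comparison of two scalars. The function name `kronecker_delta` is an illustrative choice and does not come from the original formulation.

```python
def kronecker_delta(x: float, y: float) -> int:
    """Strict similarity: 1 if x and y are identical, 0 otherwise."""
    return 1 if x == y else 0


# Identical values are maximally similar...
print(kronecker_delta(0.5, 0.5))
# ...while an arbitrarily small difference already yields zero similarity.
print(kronecker_delta(0.5, 0.5 + 1e-9))
```

Any tolerance to small differences therefore has to come from a relaxed index rather than from the delta itself.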

The distance between any vector p = [x, y] and the line y = x,
with direction û, can be readily expressed as:

$$ d(\vec{p}, \hat{u}) = \frac{|x - y|}{\sqrt{2}} \qquad (1) $$


Observe that this function is not upper bounded, i.e. all
we can say is that 0 ≤ d(p, û).
A possible manner to bound this distance is by making:

$$ d(\vec{p}, \hat{u}) = \frac{|x - y|}{\max\{|x|, |y|\}} \qquad (2) $$

which now ensures that 0 ≤ d(p, û) ≤ 1.
Having a distance measurement normalized in the interval
[0,1] is of critical importance because it allows us
to derive a respective similarity index simply as:

$$ s(\vec{p}, \hat{u}) = 1 - \frac{|x - y|}{\max\{|x|, |y|\}} \qquad (3) $$
It can be verified that:

$$ 1 - \frac{|x - y|}{\max\{|x|, |y|\}} = \frac{2 \min\{|x|, |y|\}}{|x| + |y|} \qquad (4) $$


which is a slightly more convenient manner to express
this similarity, and which will constitute one of the similarity
indices addressed in the present work:

$$ s_1(x, y) = \frac{2 \min\{|x|, |y|\}}{|x| + |y|} \qquad (5) $$

with 0 ≤ s1(x, y) ≤ 1.
The average between |x| and |y| can now be replaced
by max{|x|, |y|}, yielding another normalized similarity
index:

$$ s_2(x, y) = \frac{\min\{|x|, |y|\}}{\max\{|x|, |y|\}} \qquad (6) $$

with 0 ≤ s2(x, y) ≤ 1.
Yet another possible modification of the similarity index
in Equation 5 can be obtained by considering the
product of the two values:

$$ s_3(x, y) = \frac{|x| |y|}{(\max\{|x|, |y|\})^2} \qquad (7) $$

with 0 ≤ s3(x, y) ≤ 1.
It is also interesting to consider the following unbounded
version of the index s3:

$$ s_4(x, y) = |x| |y| \qquad (8) $$

with 0 ≤ s4 < ∞.
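The four magnitude-based indices can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names (`sim_eq5` to `sim_eq8`, keyed to the equation numbers above) are hypothetical, and returning 1 when both values are zero is a convention adopted here to avoid a division by zero, not something specified in the text.

```python
import math


def sim_eq5(x: float, y: float) -> float:
    """Equation 5: 2 min{|x|, |y|} / (|x| + |y|)."""
    ax, ay = abs(x), abs(y)
    if ax + ay == 0.0:
        return 1.0  # assumption: x = y = 0 is treated as identical
    return 2.0 * min(ax, ay) / (ax + ay)


def sim_eq6(x: float, y: float) -> float:
    """Equation 6: min{|x|, |y|} / max{|x|, |y|}."""
    ax, ay = abs(x), abs(y)
    if max(ax, ay) == 0.0:
        return 1.0  # assumption: x = y = 0 is treated as identical
    return min(ax, ay) / max(ax, ay)


def sim_eq7(x: float, y: float) -> float:
    """Equation 7: |x||y| / (max{|x|, |y|})^2."""
    ax, ay = abs(x), abs(y)
    if max(ax, ay) == 0.0:
        return 1.0  # assumption: x = y = 0 is treated as identical
    return ax * ay / max(ax, ay) ** 2


def sim_eq8(x: float, y: float) -> float:
    """Equation 8: |x||y| (not normalized)."""
    return abs(x) * abs(y)


for pair in [(1.0, 2.0), (2.0, 2.0), (0.1, 5.0)]:
    print(pair, sim_eq5(*pair), sim_eq6(*pair), sim_eq7(*pair), sim_eq8(*pair))
```

All three normalized functions return 1 when the two magnitudes coincide, which is the behaviour inherited from the Kronecker delta reference.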

Though other similarity indices can be derived in an
analogous manner, the present work will focus on the three
indices s1, s2 and s3 above.

Now, it is interesting to realize that the above indices
lose information about the relative signs of the involved
quantities. While this feature is suitable, and even desired
in some circumstances, it is important to have generalizations
of the three similarity indices derived above that can
take into account the signs of the involved quantities.

Consider the situations depicted in Figure 2. Here, we
have the four situations which need to be taken into
account while generalizing the three adopted similarity
indices to cope with negative values.


The similarity sign should express whether the two positions
point toward the same direction, in which case a positive
similarity could be expected, or if they oppose one
another, yielding a respective negative similarity sign.

The key to obtaining signed similarity consists in employing
the following functions:

$$ s_x = \mathrm{sign}(x) $$
$$ s_y = \mathrm{sign}(y) $$
$$ s_{xy} = \mathrm{sign}(x) \, \mathrm{sign}(y) $$

We shall refer to the function s_xy as the conjoint sign
function.

Figure 2: The four main situations met when comparing two positions
x and y along the real line R. It is often interesting to take
into account whether the positions point toward the same or opposite
directions.


We can now generalize the adopted similarity indices
to reflect the sign of the values x and y as:

$$ s_1 = s_{xy} \frac{\min\{s_x x, s_y y\}}{\max\{s_x x, s_y y\}} $$

$$ s_2 = s_{xy} \frac{2 \min\{s_x x, s_y y\}}{s_x x + s_y y} $$

$$ s_3 = \frac{x y}{(\max\{s_x x, s_y y\})^2} $$

$$ s_4 = s_{xy} (s_x x \, s_y y) = s_{xy}^2 \, x y = x y $$

with −1 ≤ s1, s2, s3 ≤ 1 and −∞ < s4 < ∞.
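The signed versions can be sketched in Python as below. The function names describe the form of each expression and are hypothetical, `sign` is a small helper defined here, and the value returned when both arguments are zero is an assumption made only to keep the functions total.

```python
import math


def sign(v: float) -> int:
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def signed_min_max(x: float, y: float) -> float:
    """s_xy * min{s_x x, s_y y} / max{s_x x, s_y y}."""
    sx, sy = sign(x), sign(y)
    ax, ay = sx * x, sy * y  # s_x x and s_y y are the magnitudes |x| and |y|
    if max(ax, ay) == 0:
        return 1.0  # assumption: x = y = 0 is treated as identical
    return sx * sy * min(ax, ay) / max(ax, ay)


def signed_addition_based(x: float, y: float) -> float:
    """s_xy * 2 min{s_x x, s_y y} / (s_x x + s_y y)."""
    sx, sy = sign(x), sign(y)
    ax, ay = sx * x, sy * y
    if ax + ay == 0:
        return 1.0  # assumption: x = y = 0 is treated as identical
    return sx * sy * 2.0 * min(ax, ay) / (ax + ay)


def signed_product_ratio(x: float, y: float) -> float:
    """x y / (max{s_x x, s_y y})^2."""
    m = max(abs(x), abs(y))
    if m == 0:
        return 1.0  # assumption: x = y = 0 is treated as identical
    return x * y / m ** 2


def signed_product(x: float, y: float) -> float:
    """x y, the unbounded index."""
    return x * y


# Values pointing in opposite directions give negative similarities.
print(signed_min_max(1.0, -2.0), signed_addition_based(1.0, -2.0))
print(signed_product_ratio(1.0, -2.0), signed_product(1.0, -2.0))
```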

For simplicity’s sake, both the modulus and signed ver- 
sions of the three similarities will be henceforth referred 
to simply as s1, s2, and s3, as the context shall be enough 
to indicate how they are being applied. 

In the context of polynomials, the product of two values
x and y represents a second degree operation. This operation
has an intrinsic characteristic in which the product of
two numbers larger than one tends to increase steeply with
the magnitude of the values. However, when two values
with magnitude smaller than 1 are multiplied, the resulting
value is typically substantially reduced. This characteristic
is a direct consequence of the non-linearity of
the product operation.

Figure 3 illustrates the four proposed similarity indices
in the region bounded by x ∈ [−1,1] and y ∈ [−1,1]. Both
s1 (a) and s2 (b) yield marked peaks with value 1 along
the main diagonal, indicating the close relationship between
these two similarity indices and the Kronecker’s
delta function. This diagonal peak is much less marked
in the case of s3 (c), and virtually indistinguishable in
the case of s4 (d).

Figure 3: The values of the four proposed similarity indices in the
region bounded by x ∈ [−1,1] and y ∈ [−1,1]. The gray scale varies
from 0 to 1. Observe the peak along the main diagonal in cases (a) and (b),
which have a direct relationship with the Kronecker’s delta and can
be understood as a respectively smoothed version.


Interestingly, the results from s1 to s4 can be understood
as providing successively blurred versions of the
Kronecker’s delta reference similarity function. Therefore,
stricter quantifications of similarity between two
values x and y will be provided by s1 and s2, while s3 and
s4 represent particularly yielding alternatives.


3 Multiset Similarities 


Now that we have developed a principled approach to 
quantifying the similarity between two real values x and y, 
it becomes possible to extend these indices to other math- 
ematical structures, including multisets, vectors, func- 
tions, etc. In this section we address the important sub- 
ject of quantifying the similarity between two multisets, 
which are henceforth referred to as msets. 
A multiset A can be represented as:

$$ A = \{ [a_1, m_A(a_1)]; [a_2, m_A(a_2)]; \ldots; [a_N, m_A(a_N)] \} $$

where we have N elements a_i, each with respective
multiplicity m_A(a_i). The support of this multiset is
S_A = {a_1, a_2, ..., a_N}.

The union of two msets A and B sharing the same
support is defined as:

$$ A \cup B = \{ [a_1, \max\{m_A(a_1), m_B(a_1)\}]; [a_2, \max\{m_A(a_2), m_B(a_2)\}]; \ldots; [a_N, \max\{m_A(a_N), m_B(a_N)\}] \} \qquad (9) $$


In case A and B do not share the same support, a 
respective support can be obtained for the mset union 
consisting of the set union of the respective mset supports. 

The intersection between two msets A and B sharing
the same support is given as:

$$ A \cap B = \{ [a_1, \min\{m_A(a_1), m_B(a_1)\}]; [a_2, \min\{m_A(a_2), m_B(a_2)\}]; \ldots; [a_N, \min\{m_A(a_N), m_B(a_N)\}] \} \qquad (10) $$
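For non-negative integer multiplicities, the mset union and intersection defined above coincide with the `|` and `&` operators of Python's standard `collections.Counter`. The example msets below are arbitrary illustrations.

```python
from collections import Counter

# Two msets with different supports; a missing element has multiplicity 0.
A = Counter({"a": 3, "b": 1})
B = Counter({"a": 1, "b": 2, "c": 4})

union = A | B          # elementwise maximum of the multiplicities
intersection = A & B   # elementwise minimum of the multiplicities

print(dict(union))
print(dict(intersection))
```

Note that `Counter` discards elements whose resulting multiplicity is not positive, so it does not cover the extension to real, possibly negative, multiplicities.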


Msets can be generalized to real multiplicities, includ- 
ing possibly negative values [1, 2]. 

Quantification indices of the elementwise similarity be- 
tween two multisets A and B can be immediately obtained 
by applying the four scalar similarity indices proposed in 
the previous section. 

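For simplicity’s sake, we shall abbreviate m_A(a_i) as x_i,
and m_B(a_i) as y_i, which then yields:

$$ s_1(x_i, y_i) = s_{x_i y_i} \frac{\min\{s_{x_i} x_i, s_{y_i} y_i\}}{\max\{s_{x_i} x_i, s_{y_i} y_i\}} $$

$$ s_2(x_i, y_i) = s_{x_i y_i} \frac{2 \min\{s_{x_i} x_i, s_{y_i} y_i\}}{s_{x_i} x_i + s_{y_i} y_i} $$

$$ s_3(x_i, y_i) = \frac{x_i y_i}{(\max\{s_{x_i} x_i, s_{y_i} y_i\})^2} $$

$$ s_4(x_i, y_i) = x_i y_i $$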

It is of particular interest to generalize the four indices
to quantify the similarity between two multisets A and B,
which can be done as:

$$ s_1(A, B) = \frac{\sum_i s_{x_i y_i} \min\{s_{x_i} x_i, s_{y_i} y_i\}}{\sum_i \max\{s_{x_i} x_i, s_{y_i} y_i\}} $$

$$ s_2(A, B) = \frac{2 \sum_i s_{x_i y_i} \min\{s_{x_i} x_i, s_{y_i} y_i\}}{\sum_i [s_{x_i} x_i + s_{y_i} y_i]} $$

$$ s_3(A, B) = \frac{\sum_i x_i y_i}{\sum_i (\max\{s_{x_i} x_i, s_{y_i} y_i\})^2} $$

$$ s_4(A, B) = \sum_i x_i y_i $$

The resulting index s1 for msets corresponds to the
generalization of the Jaccard similarity index to negative
values [1, 2], J_N, while the index s2 is identical to the
also recently proposed addition-based mset Jaccard index
(e.g. [1, 2]), i.e.:

$$ s_1(A, B) = J_N(A, B) \qquad (11) $$
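A minimal Python sketch of the four mset-level indices is given below, under the assumption that an mset with real multiplicities is stored as a dictionary from element to multiplicity and that elements absent from one mset have multiplicity zero. The function name `mset_similarities` and the example msets are hypothetical.

```python
import math


def sign(v: float) -> int:
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def mset_similarities(a: dict, b: dict) -> tuple:
    """Return (s1, s2, s3, s4) for two msets given as {element: multiplicity}."""
    support = set(a) | set(b)  # set union of the two supports
    num_min = den_max = den_sum = num_prod = den_max_sq = 0.0
    for element in support:
        x = a.get(element, 0.0)
        y = b.get(element, 0.0)
        ax, ay = abs(x), abs(y)  # s_x x and s_y y
        num_min += sign(x) * sign(y) * min(ax, ay)
        den_max += max(ax, ay)
        den_sum += ax + ay
        num_prod += x * y
        den_max_sq += max(ax, ay) ** 2
    if den_max == 0.0:
        raise ValueError("both msets have only null multiplicities")
    s1 = num_min / den_max           # Jaccard index extended to negative values
    s2 = 2.0 * num_min / den_sum     # addition-based variant
    s3 = num_prod / den_max_sq
    s4 = num_prod                    # unbounded, inner-product-like index
    return s1, s2, s3, s4


A = {"a": 2.0, "b": -1.0, "c": 3.0}
B = {"a": 1.0, "b": 2.0, "c": 3.0}
print(mset_similarities(A, B))
```

The element "b", whose multiplicities have opposite signs, contributes negatively to the numerators, which lowers the overall similarity.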


Therefore, the developments above allowed a principled 
derivation of those recently introduced generalizations of 
the Jaccard similarity index (e.g. [15, 16]). 


It also follows from the above developments that the 
generalized Jaccard index can be understood as an imple- 
mentation of smoothed generalizations of the Kronecker’s 
delta function based similarity to scalars, msets, vectors 
and scalar fields. 


4 Vector Similarities 


Since vectors can be understood as particular cases of
msets with support S = {1, 2, ..., N}, the respective
generalization of the proposed similarity indices to this type
of mathematical structure is immediate.

Let two vectors x = [x_1, x_2, ..., x_N] and y =
[y_1, y_2, ..., y_N]. We then have:

$$ s_1(\vec{x}, \vec{y}) = \frac{\sum_i s_{x_i y_i} \min\{s_{x_i} x_i, s_{y_i} y_i\}}{\sum_i \max\{s_{x_i} x_i, s_{y_i} y_i\}} $$

$$ s_2(\vec{x}, \vec{y}) = \frac{2 \sum_i s_{x_i y_i} \min\{s_{x_i} x_i, s_{y_i} y_i\}}{\sum_i [s_{x_i} x_i + s_{y_i} y_i]} $$

$$ s_3(\vec{x}, \vec{y}) = \frac{\sum_i x_i y_i}{\sum_i (\max\{s_{x_i} x_i, s_{y_i} y_i\})^2} $$

$$ s_4(\vec{x}, \vec{y}) = \sum_i x_i y_i $$

Observe that the index s4 becomes identical to the inner
product between the two vectors.


5 Function Similarities 


The generalization of the similarity indices to real func- 
tions follows directly from the mset continuous represen- 
tation [1, 2]. 

Given two real-valued functions f(x) and g(x) with
shared support S, we immediately have:

$$ s_1(f, g) = \frac{\int_S s_{fg} \min\{s_f f, s_g g\} \, dx}{\int_S \max\{s_f f, s_g g\} \, dx} $$

$$ s_2(f, g) = \frac{2 \int_S s_{fg} \min\{s_f f, s_g g\} \, dx}{\int_S [s_f f + s_g g] \, dx} $$

$$ s_3(f, g) = \frac{\int_S f g \, dx}{\int_S (\max\{s_f f, s_g g\})^2 \, dx} $$

$$ s_4(f, g) = \frac{\int_S f g \, dx}{|f| |g|} $$

where f(x) has been abbreviated as f, g(x) has been
abbreviated as g, |f| = ∫_S |f(x)| dx and |g| = ∫_S |g(x)| dx.
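As an illustration, the functional s1 can be approximated numerically by replacing the integrals with midpoint-rule sums over an interval [a, b]. The sketch below is a simple discretization chosen for clarity; the function name `s1_functional`, the interval, and the number of samples are all assumptions.

```python
import math


def sign(v: float) -> int:
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def s1_functional(f, g, a: float, b: float, n: int = 10_000) -> float:
    """Midpoint-rule estimate of the signed s1 index of f and g over [a, b]."""
    dx = (b - a) / n
    num = den = 0.0
    for k in range(n):
        x = a + (k + 0.5) * dx
        fx, gx = f(x), g(x)
        num += sign(fx) * sign(gx) * min(abs(fx), abs(gx)) * dx
        den += max(abs(fx), abs(gx)) * dx
    if den == 0.0:
        raise ValueError("both functions vanish on the sampled interval")
    return num / den


two_pi = 2.0 * math.pi
print(s1_functional(math.sin, math.sin, 0.0, two_pi))                    # identical functions
print(s1_functional(math.sin, lambda x: -math.sin(x), 0.0, two_pi))      # opposite functions
print(s1_functional(math.sin, lambda x: 0.5 * math.sin(x), 0.0, two_pi)) # scaled copy
```

Identical functions give the maximum value, sign-reversed functions give the minimum, and a copy scaled by one half gives an intermediate value.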


6 The Interiority and Coincidence Indices


As shown in [1], the traditional Jaccard similarity index
between two sets A and B is not capable of taking into
account how much one of the sets is interior to the other.
In order to compensate for this issue, a new similarity
index, called the coincidence index, was proposed [1] as
corresponding to the product between the traditional Jaccard
index and the interiority (or homogeneity) index.

The interiority index can be expressed as:

$$ I(A, B) = \frac{|A \cap B|}{\min\{|A|, |B|\}} \qquad (12) $$

where |A| and |B| are the cardinalities of sets A and B.
The coincidence index then results as:

$$ C(A, B) = I(A, B) \, J(A, B) \qquad (13) $$

where J(A, B) is the conventional Jaccard index.
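For ordinary (non-empty) sets, the three indices can be sketched directly with Python's built-in `set` type. The function names and the example sets are illustrative choices.

```python
import math


def jaccard(a: set, b: set) -> float:
    """Conventional Jaccard index |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b)


def interiority(a: set, b: set) -> float:
    """Interiority index |A ∩ B| / min{|A|, |B|}."""
    return len(a & b) / min(len(a), len(b))


def coincidence(a: set, b: set) -> float:
    """Coincidence index: product of interiority and Jaccard indices."""
    return interiority(a, b) * jaccard(a, b)


# B is completely interior to A: full interiority, but only partial Jaccard.
A, B = {1, 2, 3, 4}, {1, 2}
print(jaccard(A, B), interiority(A, B), coincidence(A, B))

# Partial overlap with neither set interior to the other.
C, D = {1, 2, 3}, {3, 4, 5}
print(jaccard(C, D), interiority(C, D), coincidence(C, D))
```

The first pair shows the information added by the interiority index: the Jaccard index alone does not reveal that one set is contained in the other.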

Both the interiority and coincidence indices can also be 
understood as corresponding to quantifications of simi- 
larities. As such, it becomes interesting to consider their 
generalizations to msets, vectors, and functions [1, 2, 17]. 

First, we consider the respective version of the interiority
index allowing real multiplicities [1, 2, 13, 17].
In the case of msets, we have:

$$ I(A, B) = \frac{\sum_{i \in S_+} \min\{s_{x_i} x_i, s_{y_i} y_i\}}{\min\{\sum_{i \in S_+} s_{x_i} x_i, \sum_{i \in S_+} s_{y_i} y_i\}} \qquad (14) $$





(15) 


where S_+ is the support restricted to the situations in
which s_{x_i} s_{y_i} > 0. This restriction reflects the
understanding that it is impossible to have interiority between
two msets with all respective elements having opposite
sign multiplicities. Observe that, as a consequence,
0 ≤ I(A, B) ≤ 1.
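The restricted-support interiority can be sketched as follows, assuming msets stored as dictionaries of real multiplicities and assuming that the denominator is the smaller of the two magnitude sums taken over the same restricted support. The function name `mset_interiority` and the value returned when no element has same-sign multiplicities are assumptions.

```python
import math


def sign(v: float) -> int:
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def mset_interiority(a: dict, b: dict) -> float:
    """Interiority over the elements whose multiplicities share the same sign."""
    num = sum_a = sum_b = 0.0
    for element in set(a) | set(b):
        x = a.get(element, 0.0)
        y = b.get(element, 0.0)
        if sign(x) * sign(y) > 0:  # keep only the restricted support
            num += min(abs(x), abs(y))
            sum_a += abs(x)
            sum_b += abs(y)
    den = min(sum_a, sum_b)
    if den == 0.0:
        return 0.0  # assumption: no same-sign element means no interiority
    return num / den


A = {"a": 2.0, "b": -1.0, "c": 3.0}
B = {"a": 1.0, "b": 2.0, "c": 3.0}
B2 = {"a": 1.0, "b": 2.0, "c": 5.0}
print(mset_interiority(A, B))
print(mset_interiority(A, B2))
```

Element "b" is ignored in both comparisons because its multiplicities have opposite signs.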

In case the whole support is to be taken into account, 
which can be required in some circumstances such as when 
performing template matching [13], we can make: 


$$ I(A, B) = \frac{\sum_i \min\{s_{x_i} x_i, s_{y_i} y_i\}}{\min\{|A|, |B|\}} \qquad (16) $$
(17) 
In the case of vectors, we immediately have:

$$ I(\vec{x}, \vec{y}) = \frac{\sum_{i \in S_+} \min\{s_{x_i} x_i, s_{y_i} y_i\}}{\min\{\sum_{i \in S_+} s_{x_i} x_i, \sum_{i \in S_+} s_{y_i} y_i\}} \qquad (18) $$
(19) 
So that: 


And, for functions:

$$ I(f, g) = \frac{\int_{S_+} \min\{s_f f, s_g g\} \, dx}{\min\{\int_{S_+} s_f f \, dx, \int_{S_+} s_g g \, dx\}} $$

Implying:

$$ C(f, g) = I(f, g) \, J(f, g) = I(f, g) \, s_1(f, g) \qquad (23) $$


7 Similarity Convolutions and Correlations


Each of the similarity indices generalized to the real space
of functions corresponds to a valid functional. Now, it
is possible to obtain respective convolutions and correlations.
For instance, in the case of s1, we have the following
respectively associated convolution:

$$ (f \circ g)_{s_1}[y] = \frac{\int_S s_{fg} \min\{s_f f, s_g g(y - x)\} \, dx}{\int_S \max\{s_f f, s_g g(y - x)\} \, dx} \qquad (24) $$





and correlation: 


oeb aiae ae 
S > Sg 





In the case of s4, we have:

$$ (f \circ g)_{s_4}[y] = \frac{\int_S f \, g(y - x) \, dx}{|f| |g|} $$

which corresponds to a normalized version of the standard
convolution.
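A discrete sketch of the s1-based convolution for finite sequences is given below. It is only one possible discretization: the integrals are replaced by sums, both sequences are taken as zero outside their index ranges, and the function name `s1_convolution` and the value returned for a null denominator are assumptions.

```python
import math


def sign(v: float) -> int:
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def s1_convolution(f: list, g: list) -> list:
    """s1-based convolution of two finite sequences, zero outside their ranges."""
    n, m = len(f), len(g)
    out = []
    for y in range(n + m - 1):
        num = den = 0.0
        # Visit every x where f[x] or the reflected, shifted g[y - x] is defined.
        for x in range(min(0, y - m + 1), max(n, y + 1)):
            fv = f[x] if 0 <= x < n else 0.0
            gv = g[y - x] if 0 <= y - x < m else 0.0
            num += sign(fv) * sign(gv) * min(abs(fv), abs(gv))
            den += max(abs(fv), abs(gv))
        out.append(num / den if den else 0.0)  # assumption: 0 for a null denominator
    return out


f = [1.0, 2.0, 1.0]
g = [1.0, 2.0, 1.0]
result = s1_convolution(f, g)
print(result)
```

For these two identical symmetric sequences, the output reaches its maximum value of 1 at the lag where they overlap completely and decreases toward both ends.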


8 Concluding Remarks 


The concept of similarity appears recurrently in science
and technology, underlying a large number of concepts,
operations, and properties. From the perspective of
Hilbert spaces, similarity is critically important as
it is related to the concept of inner product on which
those spaces are based. However, the quantification of
similarities between mathematical entities also constitutes
a ubiquitous task in virtually every applied area, including
but by no means limited to pattern recognition, signal
processing, and machine intelligence, to name but a few
cases.

In the present work, we developed a principled ap- 
proach in which the Kronecker’s delta function was taken 
as the prototypical reference for quantifying the similar- 
ity between two scalar values, and then developed more 
yielding versions involving the operations of minimum, 
maximum, sum and product, in addition to the sign func- 
tion. Four main indices were obtained, three of which 
are normalized in the interval [0,1], which were then ex- 
tended to respective signed versions capable of providing 
more information about the kind of similarity, yielding 
respective versions of these indices bound by the interval 
[—1, 1]. 


Then, relying on recent results regarding the extension
of multisets to functions and other mathematical structures
[1, 2, 17], we were able to extend the four signed
similarity indices to multisets, vectors, and then functions.
The extension to other mathematical structures
including scalar and vector fields can also be obtained in
an analogous manner.

Several important results have been obtained. First,
we have that the extensively applied Jaccard index relates
directly to the similarity index s1, while the index s4
led to the standard inner product functional and convolution.
Of particular interest is that the similarity functionals
recently introduced in [1, 2, 13, 14] resulted naturally
from the here reported developments. For instance, it has
been possible to verify that the mset Jaccard index, when
adapted to negative values, corresponds to the functional
respective to the described index s1. In addition, the
addition-based mset Jaccard index was shown to follow
from the index s2. The index s4, which is unbounded, led
to the standard inner product and respectively associated
convolution and correlation.